Engineering posts about Distributed Tracing
Curated summaries and key learnings for engineers working with Distributed Tracing.
Using observability data to prevent incidents
The article emphasizes the importance of using observability data to transition from reactive incident response to proactive reliability intelligence. It outlines how engineering teams can leverage...
Observability for any agent, anywhere: Production-ready tracing with OpenTelemetry & Unity Catalog on Databricks
The article discusses the challenges of traditional observability tools in managing the massive volumes of trace data generated by AI agents. It presents a solution through Databricks' integration...
Monitoring reliably at scale
The article outlines the challenges of maintaining reliable observability in systems that depend heavily on shared infrastructure, such as Kubernetes and service meshes. It highlights the...
From Custom to Open: Scalable Network Probing and HTTP/3 Readiness with Prometheus
The article outlines Slack's transition to HTTP/3 and the challenges caused by the lack of client-side observability in existing monitoring tools. It highlights the development of QUIC support...
It Wasn’t a Culture Problem: Upleveling Alert Development at Airbnb
The article outlines Airbnb's transformation of its Observability as Code (OaC) alert review process, which cut development cycles from weeks to minutes. By implementing a system...